10. Mediapipe gesture control robotic arm action group
10.1. Introduction
10.2. Using
10.4. Core files
10.4.1. mediaArm.launch
10.4.2. FingerCtrl.py
10.5. Flowchart
MediaPipe is a machine learning application development framework for data-stream processing, developed and open-sourced by Google. It is built around a graph-based data processing pipeline and supports many forms of data sources, such as video, audio, sensor data, and any time-series data. MediaPipe is cross-platform and can run on embedded platforms (Raspberry Pi, etc.), mobile devices (iOS and Android), workstations, and servers, and it supports mobile GPU acceleration. MediaPipe provides cross-platform, customizable ML solutions for real-time and streaming media.
Note: The [R2] button on the remote controller serves as the [pause/start] switch for this feature.
The example in this section may run very slowly on the robot's main controller. It is recommended to connect the camera to the virtual machine and run the file [02_PoseCtrlArm.launch] there. The NX main controller performs better; you can try it.
roslaunch arm_mediapipe mediaArm.launch    # run on the robot
rosrun arm_mediapipe FingerCtrl.py         # run on the robot, or in a virtual machine equipped with a camera
After the program is running, press the R2 button on the handle to activate control. The camera will then capture images; six gestures are recognized, as follows.
After each gesture's action is completed, the arm returns to the initial position and beeps, then waits for the next gesture to be recognized.
MediaPipe Hands infers the 3D coordinates of 21 hand landmarks from a single frame.
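As a point of reference only, the following is a minimal, self-contained sketch (not part of this package) showing how MediaPipe Hands can be used with OpenCV to read the 21 landmark coordinates from camera frames:

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # 21 landmarks, each with normalized x, y and a relative z
                for idx, lm in enumerate(hand.landmark):
                    print(idx, lm.x, lm.y, lm.z)
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("MediaPipe Hands", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()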
10.4. Core files
10.4.1. mediaArm.launch
<launch>
    <!-- Handle (joystick) control -->
    <include file="$(find yahboomcar_ctrl)/launch/yahboom_joy.launch"/>
    <!-- Robot bringup (underlying drivers) -->
    <include file="$(find yahboomcar_bringup)/launch/yahboomcar.launch"/>
    <!-- Stream camera images over HTTP -->
    <node pkg="web_video_server" type="web_video_server" name="web_video_server" output="screen"/>
    <!-- Image conversion node from the arm_mediapipe package -->
    <node name="msgToimg" pkg="arm_mediapipe" type="msgToimg.py" output="screen" required="true"/>
</launch>
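The msgToimg.py node itself is not listed here. Purely as a hedged illustration (the topic names and message handling below are assumptions, not the package's actual code), a node of this kind typically uses cv_bridge to move images between ROS messages and OpenCV frames:

#!/usr/bin/env python
# Hypothetical sketch of an image-bridging node (NOT the actual msgToimg.py):
# subscribes to a camera topic, converts the ROS Image to an OpenCV frame,
# and republishes it; the topic names are assumptions for illustration only.
import rospy
import cv2
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class MsgToImg:
    def __init__(self):
        self.bridge = CvBridge()
        self.pub = rospy.Publisher("/camera/image_out", Image, queue_size=1)
        self.sub = rospy.Subscriber("/usb_cam/image_raw", Image, self.callback, queue_size=1)

    def callback(self, msg):
        # Convert ROS Image -> OpenCV BGR frame
        frame = self.bridge.imgmsg_to_cv2(msg, "bgr8")
        # (Processing could happen here, e.g. resizing or annotation)
        frame = cv2.resize(frame, (640, 480))
        # Convert back and republish
        self.pub.publish(self.bridge.cv2_to_imgmsg(frame, "bgr8"))

if __name__ == "__main__":
    rospy.init_node("msgToimg")
    MsgToImg()
    rospy.spin()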
10.4.2. FingerCtrl.py
The implementation is straightforward. The main function opens the camera to obtain frames and passes each one into the process function, which runs "detect palm" -> "obtain finger coordinates" -> "obtain gesture" in sequence, and then decides, from the gesture result, which action to perform:

frame, lmList, bbox = self.hand_detector.findHands(frame)   # detect palm, get landmark list
fingers = self.hand_detector.fingersUp(lmList)               # determine which fingers are raised
gesture = self.hand_detector.get_gesture(lmList)             # get gesture
For the specific implementation of these three functions, refer to media_library.py; a sketch of how they fit together is given below.
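Putting the three calls together, a rough sketch of a process function with this shape might look as follows. The HandDetector methods are taken from the calls above; the run_action() mapping from gesture to an arm action group is purely hypothetical, not the exact code in FingerCtrl.py:

def process(self, frame):
    # Sketch only: assumes self.hand_detector comes from media_library.py
    frame, lmList, bbox = self.hand_detector.findHands(frame)  # detect palm, get landmark list
    if len(lmList) != 0:                                       # a hand was detected
        fingers = self.hand_detector.fingersUp(lmList)         # which fingers are raised
        gesture = self.hand_detector.get_gesture(lmList)       # classify the gesture
        if gesture is not None:
            self.run_action(gesture)                           # hypothetical: trigger the matching action group
    return frame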
To summarize the overall flow: the main function opens the camera, each frame is passed into the process function, which runs "detect palm" -> "obtain finger coordinates" -> "obtain gesture" in sequence, performs the action corresponding to the recognized gesture, and then the arm returns to its initial position to wait for the next gesture.